Detecting Acronyms from Capital Letter Sequences in Spanish

Authors

  • Rubén San-Segundo-Hernández
  • Juan Manuel Montero-Martínez
  • Verónica López-Ludeña
  • Simon King
Abstract

This paper presents an automatic strategy for deciding how to pronounce a Capital Letter Sequence (CLS) in a Text-to-Speech (TTS) system. If the CLS is known to the TTS system, it can be expanded into several words. When the CLS is unknown, the system has two alternatives: spelling it out (abbreviation) or pronouncing it as a new word (acronym). In Spanish, there is a close correspondence between letters and phonemes. Because of this, when a CLS resembles other Spanish words, there is a strong tendency to pronounce it as a standard word. This paper proposes an automatic method for detecting acronyms. Additionally, it analyses the discrimination capability of several features, and several strategies for combining them in order to obtain the best classifier. For the best classifier, the classification error is 8.45%. In the feature analysis, the best features were the Letter Sequence Perplexity and the Average N-gram order.
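
To make the idea of a letter sequence perplexity more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a letter bigram model with add-one smoothing trained on a tiny, hypothetical word list; the names `SPANISH_WORDS`, `train_bigrams` and `letter_perplexity` are illustrative only, and the abstract does not specify the n-gram order, the smoothing, or the training data actually used.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (an assumption for illustration only);
# a real system would train on a large Spanish word list.
SPANISH_WORDS = ["casa", "renta", "banco", "nota", "pena",
                 "norte", "resto", "tarea", "santo", "entre"]

ALPHABET = "abcdefghijklmnñopqrstuvwxyz"
BOUNDARY = "#"  # marks the start and end of a word


def train_bigrams(words):
    """Count letter bigrams and their left contexts over a word list."""
    pair_counts = Counter()
    context_counts = Counter()
    for word in words:
        padded = BOUNDARY + word.lower() + BOUNDARY
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    return pair_counts, context_counts


def letter_perplexity(sequence, pair_counts, context_counts):
    """Perplexity of a letter sequence under an add-one smoothed bigram model."""
    vocab_size = len(ALPHABET) + 1  # letters plus the boundary symbol
    padded = BOUNDARY + sequence.lower() + BOUNDARY
    log_prob = 0.0
    transitions = 0
    for prev, cur in zip(padded, padded[1:]):
        prob = (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
        log_prob += math.log(prob)
        transitions += 1
    return math.exp(-log_prob / transitions)


pair_counts, context_counts = train_bigrams(SPANISH_WORDS)
for cls in ["RENFE", "DNI"]:
    print(cls, round(letter_perplexity(cls, pair_counts, context_counts), 2))
```

Under these assumptions, a word-like sequence such as RENFE scores a lower perplexity than a consonant cluster such as DNI, which is the kind of signal a classifier could use to prefer pronouncing the first as a word and spelling out the second. A single threshold on this value would only be a crude stand-in for the feature combination strategies the paper evaluates.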

Similar resources

Lessons from a Community Telecenter in Southwestern Colombia

Developments in electronic communications have created many new uses for the letter “e”—“e-commerce,” “e-learning,” “e-governance,” and so forth. These terms point to some of the ways in which modern information and communications technologies (ICTs) are changing how millions of people work and live. Against that background, it makes sense for the International Center for Tropical Agriculture (...

On Runs in Independent Sequences

Given an i.i.d. sequence of n letters from a finite alphabet, we consider the length of the longest run of any letter. In the equiprobable case, results for this run turn out to be closely related to the well-known results for the longest run of a given letter. For coin-tossing, tail probabilities are compared for both kinds of runs via Poisson approximation.

Are acronyms really irregular? Preserved acronym reading in a case of semantic dementia.

This paper describes the progressive performance of JD, a patient with semantic dementia, on acronym categorisation, recognition and reading aloud over a period of 18 months. Most acronyms have orthographic and phonological configurations that are different from English words (BBC, DVD, HIV). While some acronyms, the majority, are regularly pronounced letter by letter, others are pronounced in ...

Translation of Acronyms, Initialisms and Abbreviations (AIA) in Persian Political and Sport Journalistic Texts

The different writing systems of English and Persian make the translation of acronyms, initialisms and abbreviations challenging. This study aimed to find which strategies were applied most frequently in translating acronyms, initialisms and abbreviations from English to Persian, especially in journalistic texts. The study was based on the Descriptive Translation Study of Toury and strategies pr...

Spanish recognizer of continuously spelled names over the telephone

In this paper we present a hypothesis-verification approach for a Spanish Recognizer of continuously spelled names over the telephone. We give a detailed description of the spelling task for Spanish where the most confusable letter sets are described. We introduce a new HMM topology with contextual silences incorporated into the letter model to deal with pauses between letters, increasing the L...

Publication date: 2012